SRAM Module

Mean Time Between Failure (MTBF) Analysis

Introduction 

In general terms, the reliability of any plastic module assembly can be assessed by
dividing the assembly into four critical areas: active devices, passive devices,
substrates and interconnects.

Each is reviewed separately below: 

Active Device Reliability 

Active device reliability is determined by the inherent reliability of the active
components used and by the way this inherent reliability may be degraded during
assembly and use.

♦ Inherent Component Reliability 

Inherent active device reliability is determined by the original device manufacturer.
Manufacturers carry out reliability tests (reliability monitors) on plastic products as
an ongoing process monitor, which allows component FIT (failures in time) rates to
be calculated. As the part matures and more data is accumulated, the confidence
level in the calculated FIT rate increases.

♦ Factors Affecting Inherent Component Reliability

In order to avoid degrading the inherent reliability of plastic components the 
following factors need to be considered: 

1. Moisture absorption 

Plastic Encapsulated Microcircuits can absorb moisture if not stored, handled or
used properly. Moisture absorbed during storage or manufacture can lead to the
so-called "popcorn effect" (i.e. fracturing of the plastic encapsulant due to the rapid
expansion of entrapped moisture) during the soldering process. Any ionic
contaminants remaining after the manufacturing process or deposited during field
usage may cause corrosion of internal metal surfaces. In addition, long-term
moisture intrusion can mobilize residual ionic materials, initiating or accelerating
this corrosion.
However, results produced by plastic microcircuit suppliers suggest decades of use
before moisture penetrates to the die (provided the encapsulant is intact).

Improvements in resin performance have tended to offset the loss of moisture
resistance caused by thinner resin coatings as packages get smaller.

2. Mechanical or thermal overstressing during processing 

Plastic packages can be damaged by exposure to high temperatures. Therefore the
processing temperatures used in module assembly should be as low, and of as
short a duration, as possible (and certainly within the component manufacturer's
guidelines).

3. Power dissipation 

The reliability of any semiconductor device is heavily dependent on the junction
temperatures reached during operation: the higher the temperature, the lower the
reliability. Therefore self-heating effects and surrounding ambient temperatures
should be controlled to keep the junction temperature below 150°C.
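As a minimal sketch, assuming the junction temperature is estimated as ambient plus the device's self-heating rise (the same assumption the worked examples in the appendices make, e.g. 55°C ambient plus a 20°C internal rise), the 150°C limit can be checked as follows. The function names are illustrative only.

```python
# Junction temperature limit quoted in the text.
MAX_JUNCTION_TEMP_C = 150.0


def junction_temp_c(ambient_c: float, self_heating_rise_c: float) -> float:
    """Estimate junction temperature as ambient plus internal self-heating rise."""
    return ambient_c + self_heating_rise_c


def within_junction_limit(ambient_c: float, self_heating_rise_c: float,
                          limit_c: float = MAX_JUNCTION_TEMP_C) -> bool:
    """True if the estimated junction temperature stays below the limit."""
    return junction_temp_c(ambient_c, self_heating_rise_c) < limit_c


if __name__ == "__main__":
    tj = junction_temp_c(55.0, 20.0)
    print(f"Tj = {tj:.0f} C, within limit: {within_junction_limit(55.0, 20.0)}")
```

A real design would derive the self-heating rise from power dissipation and package thermal data; here it is simply an input.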

4. Soft error rate 

In plastic devices, alpha particles can be generated by impurities in the resin.
Levels tend to be low, and buffer coatings between the die and resin absorb most of
the particles. In modules, the soft error rate would be expected to be no worse than
that of the plastic-packaged devices used.

If it is assumed that module processing does not degrade the inherent reliability of
the components, the manufacturers' published test data can be used to calculate
component FIT or MTBF. The method used for calculating these figures is described
in Appendix 1.

Passive Device Reliability 

The only passive devices used on plastic modules are ceramic chip capacitors used 
for de-coupling purposes. These devices are procured against MIL specifications 
and assembled to substrates using standard surface mount processes. 

The contribution that passive components will make to module MTBF figures is 
several orders of magnitude less than the active components and they can usually 
be ignored. 

Substrate Reliability 

Substrates generally used for modules are Printed Circuit Boards (PCBs).
Typically the boards are 6-layer multilayer boards (4 signal layers plus power and
ground) with an FR4 base laminate. Provided these boards are operated within their
specification window, their potential for failure is minimal.

Interconnect Reliability 

Solder joints provide the majority of the mechanical connection strength and the
thermal and electrical conduction paths within modules, as they are used to connect
the components and the leadframe to the substrate. The reliability of these joints is
influenced by a variety of factors identified below:
♦ Joint configuration

The designed shape of the surfaces to be joined and the amount of solder deposited
determine the joint configuration. Established design rules can be applied. In
addition, product design should ensure that joints are not subjected to undue
mechanical stress, for example due to CTE mismatch.

The physical characteristics of the selected joining alloy (after processing) will
inevitably determine joint reliability under mechanical stress.

♦ Processing 

Controlled processing is required to ensure the alloy retains its expected physical 
characteristics after the joint is formed. 

♦ Substrate finish 

Metallisation on the joint surfaces can have a significant impact on joint reliability. 

♦ Contamination 

Contamination on joint surfaces before soldering will limit joint quality. Flux
residues need to be fully removed to avoid corrosion and electrical leakage
problems. Processes need to be set up to ensure all contamination is removed. The
effectiveness can be checked by visual inspection and ionograph measurements.

♦ Thermal mismatch 

If the materials in the assembly are not carefully matched for CTE (coefficient of
thermal expansion), slow or fast thermal cycling could cause significant damage to
the solder connections.

♦ Mechanical strength 

The method used to attach the components to the substrate needs to be robust
enough to withstand mechanical stresses due to shock, acceleration, vibration, etc.

If reliability figures for PCB and interconnection need to be included in module 
MTBF calculations, the methodology described in MIL-HDBK-217F Notice 2 can be 
used. 

SUMMARY 

By correct application of design rules, careful component selection and stringent 
control of processes it can be seen that the reliability of a module is largely 
dependent on the inherent reliability of the active components used. Secondary 
factors are solder joint reliability and substrate (i.e. PCB) reliability. 

An estimate of module reliability (MTBF) can be obtained by :- 

1. Calculating active device failure rate using manufacturer's test data. 

2. Calculating the Interconnection assemblies failure rate using MIL-HDBK-217F 

3. Summing the individually calculated failure rates from above. 
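The three steps can be sketched in a few lines. This is a minimal illustration, assuming every rate is already expressed in FITs and that passive components contribute nothing (as argued above); the example figures are the SYS32256LK values worked through in Appendix 2 (8 SRAMs at 12 FIT each plus 45 FIT for the PCB and interconnections). Function names are hypothetical.

```python
FIT_DEVICE_HOURS = 1e9   # 1 FIT = 1 failure per 10^9 device hours
HOURS_PER_YEAR = 8760


def module_fit(active_fits, interconnect_fit, passive_fit=0.0):
    """Step 3: sum the individually calculated failure rates (all in FITs)."""
    return sum(active_fits) + interconnect_fit + passive_fit


def mtbf_hours(total_fit):
    """MTBF is the reciprocal of the failure rate, scaled by 10^9 hours."""
    return FIT_DEVICE_HOURS / total_fit


if __name__ == "__main__":
    total = module_fit([12.0] * 8, 45.0)   # 8 SRAMs + PCB/interconnect
    hours = mtbf_hours(total)
    print(f"{total:.0f} FIT -> MTBF {hours:.3g} h ({hours / HOURS_PER_YEAR:.0f} years)")
```

Summing FIT rates in this way assumes a series system with constant failure rates, which is the same assumption behind the summary steps.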

Note that the more that is known about the operating environment and conditions,
the more accurate the calculated value will be.
Appendix 1 - Calculation of Semiconductor Failure Rates

A fundamental part of understanding a product's reliability is understanding how
its failure rate is calculated. The traditional method of
determining a product's failure rate is through the use of accelerated high 
temperature operating life tests performed on a sample of devices randomly 
selected from its parent population. The failure rate obtained on the life test sample 
is then extrapolated to end-use conditions by means of predetermined statistical 
models to give an estimate of the failure rate in the field application. 

Although there are many other stress methods employed by semiconductor 
manufacturers to fully characterize a product's reliability, the data generated from 
operating life test sampling is the principal method used by the industry for 
estimating the failure rate of a semiconductor device in field service. 

Table 1 gives definitions of some of the terms used to describe the failure rate of 
semiconductor devices. 



Table 1: FAILURE RATE PRIMER

Failure Rate (λ): Measure of failure per unit of time. The useful life failure rate is
based on the exponential life distribution. The failure rate typically decreases
slightly over early life, then stabilizes until wear-out, which shows an increasing
failure rate. This should occur beyond useful life.

Failure In Time (FIT): Measure of failure rate in 10^9 device hours; e.g. 1 FIT = 1
failure in 10^9 device hours.

Total Device Hours (TDH): The summation of the number of units in operation
multiplied by the time of operation.

Mean Time To Failure (MTTF): Mean of the life distribution for the population of
devices under operation, or expected lifetime of an individual. MTTF = 1/λ, which is
the time when 63.2% of the population has failed. Example: for λ = 10 FITs,
MTTF = 1/λ = 100 million hours.

Confidence Level or Limit (CL): Probability level at which population failure rate
estimates are derived from a sample life test. The upper confidence level interval is
used.

Acceleration Factor (AF): A constant derived from experimental data which relates
the times to failure at two different stresses. The AF allows extrapolation of failure
rates from accelerated test conditions to use conditions.
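The FIT and MTTF definitions in Table 1 can be checked numerically. This is a minimal sketch, assuming the constant (exponential) failure rate of the useful-life period described for λ; function names are illustrative.

```python
import math

FIT_DEVICE_HOURS = 1e9  # 1 FIT = 1 failure in 10^9 device hours


def mttf_hours_from_fit(fit: float) -> float:
    """MTTF = 1/lambda, with lambda given in FITs."""
    return FIT_DEVICE_HOURS / fit


def fit_from_mttf_hours(mttf_hours: float) -> float:
    """Inverse conversion: failure rate in FITs from MTTF in hours."""
    return FIT_DEVICE_HOURS / mttf_hours


def fraction_failed(t_hours: float, fit: float) -> float:
    """Cumulative fraction of the population failed by time t (exponential model)."""
    lam_per_hour = fit / FIT_DEVICE_HOURS
    return 1.0 - math.exp(-lam_per_hour * t_hours)


if __name__ == "__main__":
    mttf = mttf_hours_from_fit(10.0)
    print(mttf, fraction_failed(mttf, 10.0))
```

At t = MTTF the exponent is -1, so the fraction failed is 1 - e^-1, about 0.632, which is where the 63.2% figure comes from.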



A simple failure rate calculation based on a single life test would follow Equation 1.

$$\lambda = \frac{1}{TDH \times AF} \qquad \text{(Eq. 1)}$$

where,

λ = failure rate

TDH = Total Device Hours = Number of units x hours under stress

AF = Acceleration factor, see Equation 3
Since reliability data can be accumulated from a number of different life tests with
several different failure mechanisms, a comprehensive failure rate is desired. The
failure rate calculation can be complicated if there is more than one failure
mechanism in a life test, since the failure mechanisms are thermally activated at
different rates. Equation 2 accounts for these conditions and includes a statistical
factor to obtain the confidence level for the resulting failure rate.



$$\lambda = \left( \sum_{i=1}^{\beta} \frac{x_i}{\sum_{j=1}^{k} TDH_j \times AF_{ij}} \right) \times \frac{M \times 10^{9}}{r} \qquad \text{(Eq. 2)}$$

where,

λ = failure rate in FITs (number of failures in 10^9 device hours)

β = number of distinct possible failure mechanisms

k = number of life tests being combined

x_i = number of failures for a given failure mechanism, i = 1, 2, ... β

TDH_j = total device hours of test time for life test j, j = 1, 2, ... k

AF_ij = acceleration factor for failure mechanism i in life test j, i = 1, 2, ... β; j = 1, 2, ... k

M = χ²(α, 2r + 2)/2

where,

χ² = chi-square factor for 2r + 2 degrees of freedom

r = total number of failures (Σ x_i)

α = confidence level expressed as a probability between 0 and 1 (e.g. 0.95 for a 95% CL)
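The combined failure rate can be sketched directly from these definitions: for each mechanism i, divide x_i by its equivalent device hours (the sum over tests j of TDH_j × AF_ij), add the terms, then scale by M × 10^9 / r with M = χ²/2. This is a minimal illustration: the χ² factor is passed in (normally read from tables), and the zero-failure case r = 0, which the formula cannot handle because it divides by r, is rejected rather than guessed at. The inputs in the usage line are the worked example given later in this appendix; the function name is hypothetical.

```python
def failure_rate_fit(failures, tdh, af, chi2_value):
    """Combined failure rate in FITs.

    failures[i] -- x_i, number of failures attributed to mechanism i
    tdh[j]      -- TDH_j, total device hours of life test j
    af[i][j]    -- AF_ij, acceleration factor of mechanism i in life test j
    chi2_value  -- chi-square factor for 2r + 2 degrees of freedom at the chosen CL
    """
    r = sum(failures)
    if r == 0:
        raise ValueError("formula divides by r; needs at least one failure")
    m = chi2_value / 2.0
    total = 0.0
    for i, x_i in enumerate(failures):
        # Equivalent device hours at use conditions for mechanism i.
        equivalent_hours = sum(tdh_j * af[i][j] for j, tdh_j in enumerate(tdh))
        total += x_i / equivalent_hours
    return total * m * 1e9 / r


if __name__ == "__main__":
    # One life test, two mechanisms (photoresist flaw, oxide defect).
    print(round(failure_rate_fit([1, 1], [1.797e6], [[148.2], [8.52]], 12.6)))
```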

In the failure rate calculation, acceleration factors (AF_ij) are used to derate the
failure rate from the thermally accelerated life test conditions to a failure rate
indicative of actual use temperature. The acceleration factor is determined from the
Arrhenius equation. This equation is used to describe physico-chemical reaction
rates and has been found to be an appropriate model for expressing the thermal
acceleration of semiconductor device failure mechanisms.



$$AF = \exp\left[ \frac{E_a}{k} \left( \frac{1}{T_{use}} - \frac{1}{T_{stress}} \right) \right] \qquad \text{(Eq. 3)}$$

where,

AF = Acceleration Factor

E_a = Thermal Activation Energy in eV (Table 2)

k = Boltzmann's Constant (8.63 x 10^-5 eV/K)

T_use = Use Temperature (°C + 273)

T_stress = Life test stress temperature (°C + 273)

Both T_use and T_stress (in kelvin) need to include the internal temperature rise of
the device to represent the junction temperature of the chip under bias.
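The Arrhenius acceleration factor can be sketched as below. It keeps the document's value of Boltzmann's constant and its +273 conversion so the results match the worked examples; temperatures are junction temperatures (ambient or oven temperature plus the internal rise). Function names are illustrative.

```python
import math

# Boltzmann's constant as used in this document (eV/K). The CODATA value is
# 8.617e-5 eV/K; the document's value is kept so the results match its examples.
BOLTZMANN_EV_PER_K = 8.63e-5


def to_kelvin(temp_c: float) -> float:
    """Convert degrees C to kelvin using +273, as the text does."""
    return temp_c + 273.0


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float,
                 k: float = BOLTZMANN_EV_PER_K) -> float:
    """Acceleration factor exp[(Ea/k) * (1/T_use - 1/T_stress)].

    Both temperatures are junction temperatures in degrees C, i.e. ambient
    (or oven) temperature plus the device's internal temperature rise.
    """
    inv_t_diff = 1.0 / to_kelvin(t_use_c) - 1.0 / to_kelvin(t_stress_c)
    return math.exp(ea_ev / k * inv_t_diff)


if __name__ == "__main__":
    # 55 C use and 150 C stress, each with a 20 C internal rise.
    print(arrhenius_af(0.7, 55 + 20, 150 + 20))
```

With the example conditions this gives about 148.2 for 0.7 eV and 8.52 for 0.3 eV, and about 16.2 for the 0.5 eV, 125°C case used in Appendix 2.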

Failure rates for commercial, industrial and military applications are generally
published at 55°C with a 60% CL within the semiconductor industry. Critical
system applications sometimes specify a 90% or 95% CL at 55°C or 125°C.

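The thermal activation energy (E_a) of a failure mechanism is determined by
performing tests at a minimum of two different temperature stress levels. The
stresses will provide the time to failure (t_f) for the two (or more) populations, thus
allowing the simultaneous solution for the activation energy as follows:

$$\ln(t_{f1}) = C + \frac{E_a}{k T_1} \qquad \text{(Eq. 4)}$$

$$\ln(t_{f2}) = C + \frac{E_a}{k T_2} \qquad \text{(Eq. 5)}$$

where C is a constant and T_1, T_2 are the two stress temperatures in kelvin. By
subtracting the two equations and solving for the activation energy, the following
equation is obtained:

$$E_a = \frac{k \ln\left( t_{f1} / t_{f2} \right)}{\dfrac{1}{T_1} - \dfrac{1}{T_2}} \qquad \text{(Eq. 6)}$$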
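Solving the two-temperature Arrhenius relation for the activation energy, Ea = k ln(t_f1/t_f2) / (1/T_1 - 1/T_2), can be sketched as follows. The times to failure in the usage line are hypothetical, chosen so that their ratio equals the photoresist-flaw acceleration factor (148.2) used in the worked example below; the function should then recover about 0.7 eV. The function name is illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.63e-5  # value used in this document


def activation_energy_ev(tf1_hours: float, t1_c: float,
                         tf2_hours: float, t2_c: float,
                         k: float = BOLTZMANN_EV_PER_K) -> float:
    """Ea = k * ln(tf1/tf2) / (1/T1 - 1/T2), temperatures given in degrees C."""
    t1 = t1_c + 273.0
    t2 = t2_c + 273.0
    if t1 == t2:
        raise ValueError("the two stress temperatures must differ")
    return k * math.log(tf1_hours / tf2_hours) / (1.0 / t1 - 1.0 / t2)


if __name__ == "__main__":
    # Hypothetical data: failures take 148.2x longer at 75 C than at 170 C.
    print(activation_energy_ev(148.2 * 1000.0, 75.0, 1000.0, 170.0))
```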



Table 2 below lists several different failure mechanisms, their cause, and the
activation energy associated with each. If no failure is recorded for the sample on
life test, the default activation energy is 1.0 eV. For an unknown failure mechanism,
an activation energy of 0.7 eV is assumed. Also listed is a possible screen to find the
failure mechanism and how to control the problem if it occurs.



Table 2: FAILURE MECHANISMS

♦ Oxide Defects (Activation Energy: 0.3 - 0.5 eV)
Screening and Testing Methodology: High Temperature Operating Life (HTOL) and
voltage stress.
Control Methodology: Statistical Process Control of oxide parameters, defect density
control and voltage stress testing.

♦ Silicon Defects (Bulk) (Activation Energy: 0.3 - 0.5 eV)
Screening and Testing Methodology: HTOL and voltage stress screens.
Control Methodology: Vendor statistical Quality Control programs, and Statistical
Process Control on thermal processes.

♦ Corrosion (Activation Energy: 0.45 eV)
Screening and Testing Methodology: Highly Accelerated Stress Testing (HAST).
Control Methodology: Passivation dopant control, hermetic seal control, improved
mold compounds, and product handling.

♦ Assembly Defects (Activation Energy: 0.5 - 0.7 eV)
Screening and Testing Methodology: Temperature cycling, temperature and
mechanical shock, and environmental stressing.
Control Methodology: Vendor statistical Quality Control programs, Statistical Process
Control of assembly processes, and proper handling.

♦ Electromigration (Activation Energy: Al line 0.6 eV; Contact/Via 0.9 eV)
Screening and Testing Methodology: Test vehicle characterizations at highly
elevated temperatures.
Control Methodology: Design process groundrules to match measured data,
statistical control of metals, photoresist and passivation.

♦ Mask Defects/Photoresist Defects (Activation Energy: 0.7 eV)
Screening and Testing Methodology: Mask fab comparisons, print checks, defect
density monitor in Fab, voltage stress test and HTOL.
Control Methodology: Clean room control, clean masks, pellicles. Statistical Process
Control of photoresist/etch processes.

♦ Contamination (Activation Energy: 1.0 eV)
Screening and Testing Methodology: C-V stress of oxides, wafer fab device stress
test and HTOL.
Control Methodology: Statistical Process Control of C-V data, oxide/interconnect
cleans, high integrity glassivation and clean assembly process.

♦ Charge Injection (Activation Energy: 1.3 eV)
Screening and Testing Methodology: HTOL and oxide characterization.
Control Methodology: Design groundrules based on test results, wafer level
Statistical Process Control of gate length and control of gate oxide thickness.

Example

Here is a simple example of how the above equations can be used to calculate the
failure rate from life test data. Assume that 600 parts were stressed at 150°C
ambient for 3000 hours, with one failure at 2000 hours for a photoresist flaw (0.7 eV)
and one failure at 3000 hours for an oxide defect (0.3 eV). The internal temperature
rise (Tj) of the part is 20°C, and the product was tested at 1000, 2000 and 3000
hours. We want to find the FIT rate for the process with a 95% CL at 55°C.



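Using the Arrhenius relationship (Eq. 3), with T_use = 55 + 20 + 273 = 348 K and
T_stress = 150 + 20 + 273 = 443 K, the acceleration factors are computed as follows:

$$AF_1 = \exp\left[ \frac{0.7}{8.63 \times 10^{-5}} \left( \frac{1}{348} - \frac{1}{443} \right) \right] = 148.2 \quad \text{Photoresist flaw} \qquad \text{(Eq. 7)}$$

$$AF_2 = \exp\left[ \frac{0.3}{8.63 \times 10^{-5}} \left( \frac{1}{348} - \frac{1}{443} \right) \right] = 8.52 \quad \text{Oxide defect} \qquad \text{(Eq. 8)}$$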



χ² = 12.6, which depends on the degrees of freedom (2r + 2 = 6 for r = 2
failures) and on α = 95/100 = 0.95 for CL = 95%.

M is then simply χ²/2 = 6.3.
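Published χ² tables are the usual source for this factor. Because the degrees of freedom here are always even (2r + 2), the χ² cumulative distribution has a closed form, so a small sketch can compute the factor without a statistics library. It treats α as the cumulative probability (0.95 for a 95% CL), matching the example; if SciPy is available, `scipy.stats.chi2.ppf(0.95, 6)` gives the same value. Function names are hypothetical.

```python
import math


def chi2_cdf_even(x: float, dof: int) -> float:
    """Chi-square CDF for an even number of degrees of freedom (closed form)."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = 1.0      # (x/2)^m / m!, starting at m = 0
    series = 1.0
    for m in range(1, dof // 2):
        term *= half / m
        series += term
    return 1.0 - math.exp(-half) * series


def chi2_quantile_even(p: float, dof: int, tol: float = 1e-10) -> float:
    """Value x with P(chi2 <= x) = p, found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:   # widen the bracket until it contains p
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    r = 2                                   # failures in the worked example
    chi2 = chi2_quantile_even(0.95, 2 * r + 2)
    print(f"chi2 = {chi2:.2f}, M = {chi2 / 2:.2f}")
```

The same routine reproduces the 60% CL value (about 6.21 for 6 degrees of freedom) and the zero-failure factor χ²/2 = 2.996 (2 degrees of freedom at 95%) used in Appendix 2.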

The total device hours (TDH) is derived from the summation of the devices on stress
multiplied by their test duration:

$$TDH = 600 \times 1000 + 599 \times 1000 + 598 \times 1000 = 1.797 \times 10^{6}\ \text{hours}$$
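The device-hour bookkeeping can be sketched as follows. The convention is an assumption inferred from the arithmetic above: a unit found failed at a readout is given no credit for the interval ending at that readout, which reproduces 600, 599 and 598 units over the three 1000-hour intervals. The function name is hypothetical.

```python
def total_device_hours(initial_units, readout_hours, failures_at_readout):
    """Sum of units on stress multiplied by time, interval by interval.

    readout_hours       -- cumulative readout times, e.g. [1000, 2000, 3000]
    failures_at_readout -- number of failures found at each readout
    Failed units get no credit for the interval in which they failed (conservative).
    """
    units = initial_units
    previous = 0.0
    tdh = 0.0
    for t, failed in zip(readout_hours, failures_at_readout):
        units -= failed
        tdh += units * (t - previous)
        previous = t
    return tdh


if __name__ == "__main__":
    print(total_device_hours(600, [1000, 2000, 3000], [0, 1, 1]))
```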
From Eq. 2, λ in FITs is computed as follows:

$$\lambda = \left( \frac{1}{1.797 \times 10^{6} \times 148.2} + \frac{1}{1.797 \times 10^{6} \times 8.52} \right) \times \left( \frac{6.3 \times 10^{9}}{2} \right) = 218\ \text{FITs} \qquad \text{(Eq. 9)}$$



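Note that for a 60% CL, χ² = 6.2, which yields 107 FITs.
The MTTF can be calculated from the reciprocal of the failure rate:

$$MTTF = \left( \frac{1}{218} \right) \times 10^{9} = 4.59 \times 10^{6}\ \text{hours @ 95\% CL} \qquad \text{(Eq. 10)}$$

$$MTTF = \left( \frac{1}{107} \right) \times 10^{9} = 9.35 \times 10^{6}\ \text{hours @ 60\% CL} \qquad \text{(Eq. 11)}$$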
Appendix 2 - Examples

SYS32256LK-020 MTBF 

The SYS 32256LK is a plastic 8Mbit Static RAM ( organised as 256K x 32) in a 64 
SIMM footprint. Bill of Material is as follows :- 

♦ 8 off 256K x 4 SRAM in 28 SOJ (e.g. Samsung KM641001AJ)

♦ 1 off 6-layer multilayer PCB, FR4 laminate (240 surface mount soldered
connections)

♦ 8 off 0.1 µF Multilayer Ceramic Chip Capacitors

Active Component Reliability

It is assumed that the operating conditions will be 55°C ambient with a 5.5V supply,
and the confidence level applied will be 95%. The internal temperature rise of the
component due to self-heating (Tj) is estimated at 20°C.

The calculated reliability depends upon the components used to build the module.
For the purposes of this calculation, data for the Samsung KM641001AJ has been
used. With reference to the method described in Appendix 1:-

Test Data ( Samsung reliability monitor) : 

1000 pcs stressed at 7V, 125°C, 96 hrs with 0 failures

387 pcs stressed at 7V, 125°C, 1008 hrs with 0 failures

Assuming a 0.5 eV activation energy, the acceleration factors are calculated as 16.24
(for temperature, from Eq. 3 with a use junction temperature of 55 + 20°C and a
stress junction temperature of 125 + 20°C) and 31.6 (for voltage).

Therefore the total number of equivalent device hours is

(1000 x 96 + 387 x 1008) x 31.6 x 16.24 ≈ 2.49 x 10^8 device hours

The χ²/2 value for 0 failures at a 95% confidence level is 2.996.

Therefore the FIT rate is (2.996 / 2.49 x 10^8) x 10^9 ≈ 12 FIT

The MTBF can be calculated from the reciprocal of the FIT rate multiplied by 10^9:-
ACTIVE COMPONENT MTBF = 83 x 10^6 hours @ 95% CL
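The zero-failure calculation above can be reproduced as a short sketch. With r = 0, the estimate is simply M divided by the accelerated device hours, with M = χ²/2 = 2.996. The temperature and voltage acceleration factors are taken as given in the text (the voltage model behind 31.6 is not stated in this document, so it is not derived here); the function name is hypothetical.

```python
def zero_failure_fit(tests, af_temperature, af_voltage, m):
    """Failure rate in FITs for life tests with no failures.

    tests -- (units, hours) pairs for each life test
    m     -- chi-square / 2 for 2 degrees of freedom at the chosen CL
    """
    raw_hours = sum(units * hours for units, hours in tests)
    equivalent_hours = raw_hours * af_temperature * af_voltage
    return m / equivalent_hours * 1e9


if __name__ == "__main__":
    fit = zero_failure_fit([(1000, 96), (387, 1008)], 16.24, 31.6, 2.996)
    print(f"{fit:.1f} FIT, MTBF {1e9 / fit:.3g} hours")
```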

PCB & Interconnection Reliability

Following the MIL-HDBK-217F Notice 2 method for surface mount interconnections
(see Interconnect Reliability above) with the following values:-

♦ d = 325 mils (28 SOJ 400 mil)

♦ h = 5 mils

♦ αS = 20 (FR4 multilayer)

♦ ΔT = 21 (Ground, fixed)

♦ αCC = 7 (plastic)

♦ T_RISE = 20°C

♦ πLC = 150 (J lead)

♦ CR = 0.021 cycles/hr (Industrial)

♦ Design life = 20 years

Therefore, inserting values and calculating:-
Nf = 60,836 thermal cycles to failure

αSMT = Nf/CR = 60,836/0.021 ≈ 2,897,000 hours

LC/αSMT = (20 x 8760)/2,897,000 = 0.061
Therefore, from the table, ECF = 0.13

λSMT = ECF/αSMT = 0.13/2,897,000 = 4.5 x 10^-8 failures per hour

λSMT = 45 FIT

The MTBF can be calculated from the reciprocal of the FIT rate multiplied by 10^9:-
PCB/INTERCONNECT MTBF = (1/45) x 10^9 = 22 x 10^6 hours
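The steps from Nf onwards can be checked with a short sketch. Nf is taken from the text rather than recomputed, and the ECF value still has to be read from the MIL-HDBK-217F table, which is not reproduced here; the sketch only computes αSMT, the design-life ratio used to enter that table, and the resulting failure rate. Function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def smt_characteristic_life(nf_cycles, cycles_per_hour):
    """alpha_SMT = Nf / CR, in hours."""
    return nf_cycles / cycles_per_hour


def design_life_ratio(design_life_years, alpha_smt_hours):
    """LC / alpha_SMT, used to look up ECF in the handbook table."""
    return design_life_years * HOURS_PER_YEAR / alpha_smt_hours


def smt_failure_rate_fit(ecf, alpha_smt_hours):
    """lambda_SMT = ECF / alpha_SMT, converted to FITs."""
    return ecf / alpha_smt_hours * 1e9


if __name__ == "__main__":
    alpha = smt_characteristic_life(60836, 0.021)
    ratio = design_life_ratio(20, alpha)
    fit = smt_failure_rate_fit(0.13, alpha)   # ECF = 0.13 read from the table
    print(f"alpha_SMT = {alpha:,.0f} h, LC/alpha = {ratio:.3f}, {fit:.0f} FIT")
```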

Passive Device Reliability 

The contribution of passive devices ( i.e. capacitors ) is negligible and can be 
ignored. 

SYS32256LK-020 Reliability 

The total module reliability is given by the sum of all of the FIT rates of 
contributing elements :- 

Active components = 8 x 12 FITs = 96 FITs

PCB/Interconnections = 45 FITs

Therefore SRAM Module = 141 FITs

SYS32256LK-020 MTBF = 7 x 10^6 hours (approximately 800 years)



February 1999

Hybrid Memory Products Ltd



SYS32128LK-020 MTBF 

The SYS 32128LK is a plastic 4Mbit Static RAM (organised as 128K x 32) in a 64
SIMM footprint. Bill of Material is as follows :-

♦ 4 off 128K x 8 SRAM in 32 SOJ (e.g. Samsung KM681001AJ)

♦ 1 off 6-layer multilayer PCB, FR4 laminate (136 surface mount soldered
connections)

♦ 4 off 0.1 µF Multilayer Ceramic Chip Capacitors

Active Component Reliability

It is assumed that the operating conditions will be 55°C ambient with a 5.5V supply,
and the confidence level applied will be 95%. The internal temperature rise of the
component due to self-heating (Tj) is estimated at 20°C.

The calculated reliability depends upon the components used to build the module. 
For the purposes of this calculation, data for the Samsung KM681001AJ has been 
used. 

The KM681001AJ essentially uses the same die type as the KM641001AJ above (with
just a 2nd layer metal pattern change). Therefore the estimated MTBF should be the
same:-

ACTIVE COMPONENT MTBF = 83 x 10^6 hours @ 95% CL

PCB & Interconnection Reliability

The PCB used for this module is similar to the PCB above. The only difference is 
the use of 32 SOJ instead of 28 SOJ components. As a result the calculated figure is 
slightly worse because of a higher value for the d parameter. 

d = 375 mils (32 SOJ 400 mil)

Working through the calculation in the same way gives:-
λSMT = 62 FIT

The MTBF can be calculated from the reciprocal of the FIT rate multiplied by 10^9:-
PCB/INTERCONNECT MTBF = (1/62) x 10^9 = 16 x 10^6 hours

Passive Device Reliability 

The contribution of passive devices ( i.e. capacitors ) is negligible and can be 
ignored. 

SYS32128LK-020 Reliability 

The total module reliability is given by the sum of all of the FIT rates of 
contributing elements:- 

Active components = 4 x 12 FITs = 48 FITs

PCB/Interconnections = 62 FITs

Therefore SRAM Module = 110 FITs

SYS32128LK-020 MTBF = 9 x 10^6 hours (1037 years)